Prerequisites
The EDL Pipeline requires a Unix-like environment (Linux, macOS, or WSL on Windows) with Python 3.7+.
Windows Users: Use WSL (Windows Subsystem for Linux) or Git Bash. The native Windows Command Prompt may have issues with curl commands and path handling.
System Requirements
| Requirement | Specification |
|---|---|
| Python version | 3.7 or higher (tested on 3.8-3.11) |
| Disk space | Minimum 500 MB free (2 GB recommended for OHLCV data) |
| Network | Stable internet connection (pipeline fetches 30+ MB of data) |
| Memory | 4 GB RAM minimum (8 GB recommended) |
Installation Steps
Verify Python Installation
Check that Python 3 is installed:

python3 --version

Expected output: a version string such as Python 3.11.4.

If Python is not installed, download it from python.org or use your system's package manager:

# macOS (Homebrew)
brew install python3
# Ubuntu/Debian
sudo apt update && sudo apt install python3 python3-pip
# Fedora/RHEL
sudo dnf install python3 python3-pip
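You can also verify the interpreter version from Python itself. A minimal standard-library sketch (the `python_ok` helper is illustrative, not part of the pipeline):

```python
import sys

def python_ok(minimum=(3, 7)) -> bool:
    """True if the running interpreter meets the (major, minor) minimum."""
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    version = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {version}:", "OK" if python_ok() else "TOO OLD (need 3.7+)")
```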
Install Python Dependencies
The pipeline requires three core Python packages:

pip3 install requests pandas beautifulsoup4
Or use a requirements file (requirements.txt):

requests>=2.28.0
pandas>=1.5.0
beautifulsoup4>=4.11.0
pip3 install -r requirements.txt
| Package | Version | Purpose |
|---|---|---|
| requests | >=2.28.0 | HTTP client for API calls to Dhan and NSE endpoints |
| pandas | >=1.5.0 | OHLCV data processing, CSV parsing (NSE listings) |
| beautifulsoup4 | >=4.11.0 | HTML parsing for surveillance lists (Google Sheets fallback) |
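To confirm that installed packages meet these minimums, here is a hypothetical standard-library check (requires Python 3.8+ for `importlib.metadata`; the helper names are illustrative):

```python
from importlib import metadata

# Minimum versions from the dependency table above.
MIN_VERSIONS = {
    "requests": (2, 28, 0),
    "pandas": (1, 5, 0),
    "beautifulsoup4": (4, 11, 0),
}

def parse_version(version: str) -> tuple:
    """Turn '2.28.1' into (2, 28, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def check_dependencies(minimums=MIN_VERSIONS):
    """Map each package to (installed_version_or_None, meets_minimum)."""
    results = {}
    for pkg, minimum in minimums.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            results[pkg] = (None, False)
            continue
        results[pkg] = (installed, parse_version(installed) >= minimum)
    return results

if __name__ == "__main__":
    for pkg, (version, ok) in check_dependencies().items():
        print(f"{pkg}: {version or 'MISSING'} ->", "OK" if ok else "NEEDS INSTALL/UPGRADE")
```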
Verify Installation
Confirm all dependencies are installed:

python3 -c "import requests, pandas, bs4; print('All dependencies OK')"

Expected output:

All dependencies OK
Locate the Pipeline Directory
Navigate to the EDL Pipeline source code:

cd ~/workspace/source/"DO NOT DELETE EDL PIPELINE"

(Quote only the folder name, not the `~`, so the shell expands your home directory.)
Verify the master runner script exists: ls -l run_full_pipeline.py
DO NOT DELETE or RENAME this directory. The folder name is intentionally explicit to prevent accidental removal. All pipeline scripts use relative paths and expect to run from this directory.
Verify Directory Structure
The pipeline directory should contain these 18 core scripts:

run_full_pipeline.py # Master runner
fetch_dhan_data.py # Phase 1: Core data
fetch_fundamental_data.py # Phase 1: Fundamentals
fetch_company_filings.py # Phase 2: Filings
fetch_new_announcements.py # Phase 2: Announcements
fetch_advanced_indicators.py # Phase 2: Indicators
fetch_market_news.py # Phase 2: News
fetch_corporate_actions.py # Phase 2: Corporate actions
fetch_surveillance_lists.py # Phase 2: ASM/GSM
fetch_circuit_stocks.py # Phase 2: Circuits
fetch_bulk_block_deals.py # Phase 2: Bulk deals
fetch_incremental_price_bands.py # Phase 2: Price bands
fetch_complete_price_bands.py # Phase 2: Price bands
fetch_all_ohlcv.py # Phase 2.5: OHLCV
bulk_market_analyzer.py # Phase 3: Base JSON
advanced_metrics_processor.py # Phase 4: Metrics
process_earnings_performance.py # Phase 4: Earnings
enrich_fno_data.py # Phase 4: F&O data
add_corporate_events.py # Phase 4: Events (LAST)
Optional/Standalone Scripts
These scripts are NOT part of the main pipeline but can be run manually:

fetch_all_indices.py # 194 market indices
fetch_etf_data.py # 361 ETFs
fetch_fno_data.py # 207 F&O stocks
fetch_fno_lot_sizes.py # F&O lot sizes
fetch_fno_expiry.py # Expiry calendar
single_stock_analyzer.py # Single stock inspector
pipeline_utils.py # Shared utilities
Test Run (Dry Run)
Verify the pipeline can start without errors: python3 -c "import run_full_pipeline; print('Pipeline module loaded successfully')"
Or run a quick test with a single script: python3 fetch_dhan_data.py
This should create two files:
dhan_data_response.json (~5 MB)
master_isin_map.json (~200 KB)
Verify: ls -lh dhan_data_response.json master_isin_map.json
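If you want to sanity-check those two outputs beyond `ls`, here is a small illustrative script (the `verify_output` helper and the byte floors are assumptions based on the approximate sizes above):

```python
import json
import os

def verify_output(path: str, min_bytes: int) -> str:
    """Check that an output file exists, parses as JSON, and meets a
    rough size floor; return a one-line status string."""
    if not os.path.exists(path):
        return f"{path}: MISSING"
    size = os.path.getsize(path)
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return f"{path}: EXISTS but is not valid JSON"
    if size < min_bytes:
        return f"{path}: only {size} bytes (smaller than expected)"
    return f"{path}: OK ({size} bytes)"

if __name__ == "__main__":
    # Size floors are rough, based on the ~5 MB / ~200 KB estimates above.
    print(verify_output("dhan_data_response.json", 1_000_000))
    print(verify_output("master_isin_map.json", 50_000))
```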
Directory Structure After First Run
After running the pipeline once, your directory will look like this:
With CLEANUP_INTERMEDIATE = True (Default)

DO NOT DELETE EDL PIPELINE/
├── run_full_pipeline.py
├── fetch_*.py (18 scripts)
├── all_stocks_fundamental_analysis.json.gz # PRIMARY OUTPUT (2-4 MB)
└── ohlcv_data/ # OHLCV cache (if FETCH_OHLCV = True)
    ├── RELIANCE.csv
    ├── TCS.csv
    └── ... (2,775 CSV files)

With CLEANUP_INTERMEDIATE = False

The same layout, except the intermediate JSON files written by each fetch script are retained alongside the primary output.
Recommended: Keep CLEANUP_INTERMEDIATE = True to save disk space. The compressed output contains all data needed for analysis.
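The compressed primary output can be read directly with the standard library. A sketch, assuming the file is a single gzipped JSON document (its internal structure is covered in the Field Reference):

```python
import gzip
import json

def load_primary_output(path="all_stocks_fundamental_analysis.json.gz"):
    """Decompress and parse the primary pipeline output in one step."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)

if __name__ == "__main__":
    data = load_primary_output()
    print(type(data).__name__, "with", len(data), "top-level entries")
```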
Network Configuration
The pipeline makes HTTP requests to multiple endpoints:
Endpoint List

| Endpoint | Purpose | Rate Limit |
|---|---|---|
| ow-scanx-analytics.dhan.co | Full market scan, corporate actions | Thread pool: 1 |
| open-web-scanx.dhan.co | Fundamental data | Thread pool: 1 |
| ow-static-scanx.dhan.co | Filings, announcements, indicators, deals | Thread pool: 15-50 |
| news-live.dhan.co | Real-time news feed | Thread pool: 15 |
| openweb-ticks.dhan.co | OHLCV historical data | Thread pool: 15 |
| nsearchives.nseindia.com | Listing dates, price bands | Direct curl |
| Google Sheets (fallback) | Surveillance lists | Direct requests |
Firewall/Proxy Users: Ensure outbound HTTPS (port 443) is allowed for:
*.dhan.co
nsearchives.nseindia.com
docs.google.com (for surveillance list fallback)
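A quick preflight sketch that checks TCP reachability of each host on port 443 (the `can_reach` helper is illustrative; a successful TCP connect does not guarantee your proxy will pass the actual HTTPS requests):

```python
import socket

# Hosts from the endpoint table above; outbound port 443 must be open.
ENDPOINTS = [
    "ow-scanx-analytics.dhan.co",
    "open-web-scanx.dhan.co",
    "ow-static-scanx.dhan.co",
    "news-live.dhan.co",
    "openweb-ticks.dhan.co",
    "nsearchives.nseindia.com",
    "docs.google.com",
]

def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host in ENDPOINTS:
        print(f"{host}:", "reachable" if can_reach(host) else "BLOCKED or unreachable")
```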
Validation Checklist
Before running the full pipeline, verify:
Python Dependencies
python3 -c "import requests, pandas, bs4; print('✅ All dependencies OK')"
Network Connectivity
curl -s -o /dev/null -w "%{http_code}" https://ow-scanx-analytics.dhan.co
Expected: 200 or 405 (endpoint exists)
Disk Space
df -h . | tail -1 | awk '{print $4 " available"}'
Ensure at least 500 MB free (2 GB if using OHLCV)
Write Permissions
touch test.json && rm test.json && echo "✅ Write permission OK"
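The disk-space item in the checklist can also be done from Python with `shutil.disk_usage` (thresholds taken from the System Requirements table):

```python
import shutil

def free_mb(path: str = ".") -> int:
    """Free disk space at path, in whole megabytes."""
    return shutil.disk_usage(path).free // (1024 * 1024)

if __name__ == "__main__":
    free = free_mb()
    # 500 MB minimum; 2 GB recommended when OHLCV fetching is enabled.
    print(f"{free} MB free:", "OK" if free >= 500 else "LOW DISK SPACE")
```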
Troubleshooting Installation
ModuleNotFoundError: No module named 'requests'
Cause: Dependencies not installed in the correct Python environment.

Solution:

# Ensure you're using the same python3 binary
which python3
# Install with explicit python3 pip
python3 -m pip install requests pandas beautifulsoup4
# Verify installation
python3 -m pip list | grep -E '(requests|pandas|beautifulsoup4)'
Permission Denied when running scripts
Cause: Scripts lack execute permissions.

Solution:

# Make scripts executable
chmod +x *.py
# Or run with python3 explicitly
python3 run_full_pipeline.py
curl: command not found (NSE CSV download fails)
Cause: curl not installed.

Solution:

# macOS: curl is pre-installed
# Ubuntu/Debian:
sudo apt install curl
# Fedora/RHEL:
sudo dnf install curl
# Verify:
curl --version
Impact if not fixed: Listing dates will be missing, but the pipeline will continue (non-critical).
SSL Certificate Verification Failed
Cause: Corporate proxy or outdated CA certificates.

Solution:

# Update CA certificates
# Ubuntu/Debian:
sudo apt update && sudo apt install ca-certificates
# macOS:
/Applications/Python\ 3.X/Install\ Certificates.command
# Or temporarily disable SSL verification (NOT RECOMMENDED for production):
# Add to fetch scripts:
# response = requests.post(url, json=payload, headers=headers, verify=False)
Timeout errors during OHLCV fetch
Cause: Slow network or rate limiting.

Solution:

# First run: Expect 30-40 min for lifetime OHLCV download
# If timing out repeatedly, increase timeout in fetch_all_ohlcv.py (line ~50):
# timeout=30 → timeout=60
# Or skip OHLCV for faster pipeline:
# Edit run_full_pipeline.py:
FETCH_OHLCV = False
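If you would rather retry than raise the timeout, a generic retry-with-backoff wrapper can help. This is a sketch, not the pipeline's own retry logic; `fetch_with_retry` is a hypothetical helper you could adapt inside fetch_all_ohlcv.py:

```python
import time

def fetch_with_retry(fetch, attempts=3, base_delay=2.0):
    """Call a zero-argument callable, retrying on any exception with
    exponential backoff (2 s, 4 s, ... by default). Re-raises the last
    error if every attempt fails. attempts must be >= 1."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as err:  # in practice: requests timeouts
            last_error = err
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    raise last_error
```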
Virtual Environment (Optional but Recommended)
To isolate dependencies from system Python:
Create Virtual Environment

python3 -m venv edl-env
Activate Environment
# Linux/macOS:
source edl-env/bin/activate
# Windows (WSL):
source edl-env/bin/activate
Your prompt should now show (edl-env).
Install Dependencies
pip install requests pandas beautifulsoup4
Run Pipeline
python run_full_pipeline.py
Next Steps
Quick Start Guide Run your first pipeline and explore the output
Pipeline Settings Customize pipeline behavior
Pipeline Architecture Understand the pipeline phases
Field Reference Complete guide to all 86 output fields